Abstract



We examine what determines the designability of two-letter (H and P) lattice
proteins from three points of view: first, whether the native structure is
searched for among all possible structures or only among maximally compact
structures; second, whether the lattice is bipartite or not; third, the effect
of the chain length, namely, the number of monomers on the chain. We find that
the bipartiteness of the lattice is not a main factor determining the
designability. Our results suggest that highly designable structures appear
when the chain is long enough to form a hydrophobic core consisting of a
sufficient number of monomers.

INTRODUCTION



Natural proteins fold into unique compact structures in spite of the huge number of
possible conformations. For most single-domain proteins, each of these native structures
corresponds to the global minimum of the free energy.

It has been proposed phenomenologically by C. Chothia that the number of possible
structures of natural proteins is only about one thousand, which suggests that many
sequences can fold into one preferred structure. There have been theoretical studies of
the existence of such preferred structures.

In many theoretical studies of protein folding, a simplified model called the HP model
is adopted. The HP model is a two-letter lattice model in which a protein is
represented by a self-avoiding chain of beads placed on a lattice, with two types of beads,
hydrophobic (H) and polar (P). In the HP model, the energy of a structure is given by
nearest-neighbor topological contact interactions as

    H = -\sum_{i<j} E_{\sigma_i \sigma_j} \Delta(r_i - r_j)    (1)

where $i$ and $j$ are monomer indices, $\{\sigma_i\}$ are the monomer types ($\sigma$ = H or P),
$\Delta(r_i - r_j) = 1$ if $r_i$ and $r_j$ are topological nearest neighbors that are not
adjacent along the sequence, and $\Delta(r_i - r_j) = 0$ otherwise.

Based on the HP model, the concept of designability has recently been introduced by
H. Li et al.: the number of sequences that have a given structure as their non-degenerate
ground state (native state) is called the designability of this structure. When many
sequences share a common native structure, one says that the structure is highly designable.
Besides its importance for the protein design problem, designability also has evolutionary
significance, because highly designable structures are found to be relatively stable
against mutations.
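
The definition above amounts to a simple counting rule. The following is a minimal sketch of that rule, assuming a hypothetical precomputed table `energy[seq]` that lists the energy of each sequence on every structure; the function name, the tolerance `tol`, and the toy numbers are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter

def designability(energy, tol=1e-9):
    """Count, for each structure, the sequences whose unique ground state it is.

    energy[seq][k] is the energy of sequence `seq` on structure k
    (a hypothetical precomputed table, not data from the paper).
    """
    counts = Counter()
    for seq, levels in energy.items():
        e_min = min(levels)
        ground = [k for k, e in enumerate(levels) if e - e_min <= tol]
        if len(ground) == 1:          # non-degenerate ground state = native state
            counts[ground[0]] += 1
    return counts

# Toy table: three sequences on three structures (made-up numbers).
toy = {
    "HPPH": [-2.0, -1.0, 0.0],   # unique minimum on structure 0
    "HHPP": [-1.0, -1.0, 0.0],   # degenerate minimum: no native state
    "PHHP": [-3.0, -0.5, -1.0],  # unique minimum on structure 0
}
result = designability(toy)
print(result)
```

In this toy table structure 0 has designability 2, while the sequence with a degenerate minimum contributes to no structure at all.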

In the original study of H. Li et al., HP models on the square and cubic lattices were
employed, with the energy parameters in Eq. (1) being $E_{HH} = -2.3$, $E_{HP} = -1.0$,
and $E_{PP} = 0.0$. For each sequence, they calculated the energy over all maximally compact
structures and picked out the native structure. The results indicated that highly designable
structures actually exist on both lattices.

A. Irback and E. Sandelin studied the HP model on the square and triangular lattices [5].
They adopted energy parameters different from those of H. Li et al., namely,
$E_{HH} = -1$ and $E_{HP} = E_{PP} = 0$. In calculating the designability, they considered
all possible structures rather than restricting the search to the maximally compact ones.
For the square lattice, they confirmed the existence of highly designable structures, as
found by H. Li et al. For the triangular lattice, however, no such structures were found.
In addition to the nearest-neighbor topological contact interactions, they also considered
local interactions represented by the bend angle and recalculated the designability. The
local interactions indeed reduced the degeneracy (i.e., the number of sequences with a
non-degenerate ground state increased) and raised the designability, but the designability
on the square lattice remained much higher than that on the triangular lattice. They
concluded that the difference in designability between the two lattices is related to the
even-odd problem, that is, to whether the lattice is bipartite or not.
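
The even-odd property behind this conclusion can be checked by brute force. The sketch below is an illustration under our own assumptions (axial coordinates for the triangular lattice, short chains, hypothetical function names): it collects the sequence separations |i - j| of all non-bonded nearest-neighbor contacts that occur in any self-avoiding walk.

```python
SQUARE = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# Triangular lattice in axial coordinates: six steps at multiples of 60 degrees.
TRIANGULAR = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def walks(n, dirs):
    """Yield all self-avoiding walks of n monomers starting at the origin."""
    def extend(path, occupied):
        if len(path) == n:
            yield list(path)
            return
        x, y = path[-1]
        for dx, dy in dirs:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:
                path.append(nxt)
                occupied.add(nxt)
                yield from extend(path, occupied)
                path.pop()
                occupied.remove(nxt)
    yield from extend([(0, 0)], {(0, 0)})

def contact_separations(n, dirs):
    """Set of j - i over all non-bonded nearest-neighbor pairs in all walks."""
    steps = set(dirs)
    seps = set()
    for w in walks(n, dirs):
        for i in range(n):
            for j in range(i + 2, n):
                d = (w[j][0] - w[i][0], w[j][1] - w[i][1])
                if d in steps:
                    seps.add(j - i)
    return seps

square_seps = contact_separations(8, SQUARE)
tri_seps = contact_separations(5, TRIANGULAR)
print(sorted(square_seps))  # bipartite lattice: only odd separations occur
print(sorted(tri_seps))     # non-bipartite lattice: even separations occur as well
```

On the square lattice the sublattice alternates along the chain, so two monomers can touch only if their separation along the sequence is odd; the triangular lattice has no such restriction.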

Quite recently, H. Li et al. proposed a new model based on the HP model on the square
lattice. In this model, the hydrophobic interaction is treated in such a way that the energy
decreases if a hydrophobic residue is buried in the core. They justify this treatment on
two grounds: (1) the hydrophobic force, which is dominant in folding, originates from the
aversion of hydrophobic residues to water; (2) the Miyazawa-Jernigan matrix contains
a dominant hydrophobic interaction of the linear form $E_{\alpha\beta} = h_\alpha + h_\beta$.
They took

    H = -\sum_{i=1}^{N} s_i h_i    (2)

where $\{h_i\}$ represents a sequence ($h_i = 1$ if the $i$-th amino acid is H-type and
$h_i = 0$ if it is P-type) and $\{s_i\}$ represents a structure ($s_i = 0$ if the $i$-th
amino acid is on the surface and $s_i = 1$ if it is in the core). They calculated the
designability over all maximally compact structures, and the result is consistent with
their former study (see Table I).

In our view, many points concerning the designability problem remain to be explored.
First, since the structures of natural proteins are compact but not necessarily "maximally
compact", how can one justify a discussion in which only the maximally compact
structures are taken into account? Second, is it adequate to consider only nearest-neighbor
interactions? The properties of a system with only nearest-neighbor interactions are directly
affected by the lattice structure, in particular by whether the lattice is bipartite or not.
Is it justified to conclude immediately, from these facts alone, that the absence of highly
designable structures on the triangular lattice should be ascribed to the even-odd problem
associated with that lattice [5]? One should examine the problem on the triangular
lattice using a model like the new one of H. Li et al., in which the interactions do not
depend on contacts between monomers and hence do not directly reflect the non-bipartiteness.

The aim of this paper is to examine the above points and to clarify what determines the
designability of protein structures. We use a new two-letter (H and P) model on
the square and triangular lattices and calculate the designability over all possible structures.
In our model, which is based on the new model of H. Li et al., the energy increases if a
hydrophobic residue is exposed to the solvent. We call this model the "solvation model".
In brief, the solvation model is a two-letter lattice model in which the hydrophobic force
that forms a core is dominant and the interactions do not directly reflect the bipartiteness.
Using the solvation model and the HP model, we investigate model-independent properties
of designability.

THE MODELS 

In the solvation model, which is based on the new model of H. Li et al., a protein is
represented by a self-avoiding chain of beads of two types, H and P, placed on a lattice.
A sequence is specified by the choice of monomer type at each position on the chain.

We used two-dimensional lattice models because the chain length that can be treated by
numerical enumeration of the full conformational space is limited (square lattice: 18;
triangular and cubic lattices: 13). Even with this chain-length limitation, a "hydrophobic
core" can be formed in two dimensions, in contrast to the three-dimensional case.

A structure is specified by a set of coordinates for all the monomers and is mapped onto
the number of contacts of each monomer with the solvent. In our model, the total energy is
given in terms of the monomer-solvent interactions and depends only on the number of
contacts with the solvent:

    H = \sum_{i=1}^{N} E_{s_i} h_i    (3)

where $\{h_i\}$ represents a sequence: $h_i = 1$ if the $i$-th monomer is H-type and
$h_i = 0$ if it is P-type. The variable $s_i$ denotes the number of contacts of the $i$-th
monomer with the solvent; for example, $s_i \in \{0,1,2,3\}$ on the square lattice and
$s_i \in \{0,1,2,3,4,5\}$ on the triangular lattice. In other words, $s_i = 0$ means that the
$i$-th monomer is buried away from the solvent. We take $E_0 = 0$, $E_1 = \sqrt{2}$,
$E_2 = \sqrt{7}$, $E_3 = \sqrt{13}$, $E_4 = \sqrt{19}$, and $E_5 = \sqrt{23}$, so that the
minimum possible energy is zero. These parameters are chosen so that the energy increases
with the number of contacts with the solvent; in particular, a hydrophobic residue is
energetically unfavorable at a corner. Although the choice of these values is somewhat
arbitrary, we have considered the following points: (1) the values should not increase too
rapidly with the number of contacts with the solvent, and (2) the choice must not bring
about nonessential accidental degeneracies (due to simple rational ratios between the
parameters).
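
As an illustration of Eq. (3), the sketch below evaluates the solvation energy of one conformation on the square lattice. It assumes that every unoccupied nearest-neighbor site counts as one contact with the solvent, which is our reading of the model rather than something the text states explicitly; the function names and the example chain are hypothetical.

```python
import math

SQUARE = [(1, 0), (0, 1), (-1, 0), (0, -1)]
# E_s for s = 0..5 solvent contacts, with the values quoted in the text.
E = [0.0, math.sqrt(2), math.sqrt(7), math.sqrt(13), math.sqrt(19), math.sqrt(23)]

def solvent_contacts(coords, dirs=SQUARE):
    """s_i = number of empty lattice sites adjacent to monomer i (assumption)."""
    occupied = set(coords)
    return [sum((x + dx, y + dy) not in occupied for dx, dy in dirs)
            for x, y in coords]

def solvation_energy(coords, seq, dirs=SQUARE):
    """Eq. (3): sum of E[s_i] over hydrophobic monomers (h_i = 1 for 'H')."""
    s = solvent_contacts(coords, dirs)
    return sum(E[si] for si, m in zip(s, seq) if m == "H")

# A 3x3 spiral: the last monomer sits in the centre and is fully buried.
spiral = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1), (1, 1)]
s = solvent_contacts(spiral)
buried = solvation_energy(spiral, "PPPPPPPPH")   # single H in the core
corner = solvation_energy(spiral, "HPPPPPPPP")   # single H at a corner
print(s, buried, corner)
```

A single H placed in the centre of the spiral costs no energy, whereas the same H at a corner has two solvent contacts and costs $E_2 = \sqrt{7}$.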

Using the model on the square and triangular lattices, we calculate the designability for
all $2^N$ sequences, where $N$ is the number of monomers, by exact computer enumeration
over the full conformational space. To obtain correct data, we exclude the overcounting
that comes from redundant structures related to one another by rotation, reflection, and
reverse labeling.
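
The enumeration procedure can be sketched as follows for the square lattice. This is an illustrative reimplementation, not the authors' program, and it rests on several assumptions: an empty neighboring site counts as one solvent contact; rotations and reflections are removed by fixing the first step and the first turn; a walk and its reverse-labeled copy are identified through a canonical key; and energies are compared within a small tolerance `tol`. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

E = [0.0, math.sqrt(2), math.sqrt(7), math.sqrt(13)]   # square lattice: s = 0..3
DIRS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def conformations(n):
    """Self-avoiding walks of n monomers, one per rotation/reflection class.

    The first step is fixed to +x and the first step off the x-axis must be +y.
    """
    out = []
    def extend(path, occupied, bent):
        if len(path) == n:
            out.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in DIRS:
            if not bent and dy < 0:
                continue                  # mirror image of a walk that bends up
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            path.append(nxt)
            occupied.add(nxt)
            extend(path, occupied, bent or dy != 0)
            path.pop()
            occupied.remove(nxt)
    extend([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)
    return out

def key(path):
    """Canonical label shared by a walk and its reverse-labeled copy."""
    def normal(p):
        (x0, y0), (x1, y1) = p[0], p[1]
        ux, uy = x1 - x0, y1 - y0         # first step; rotate it onto +x
        q = [((x - x0) * ux + (y - y0) * uy, -(x - x0) * uy + (y - y0) * ux)
             for x, y in p]
        sign = next((1 if y > 0 else -1 for _, y in q if y != 0), 1)
        return tuple((x, sign * y) for x, y in q)
    return min(normal(path), normal(path[::-1]))

def designability(n, tol=1e-9):
    """Map each structure key to the number of sequences having it as native state."""
    confs = conformations(n)
    ids = [key(c) for c in confs]
    profiles = []
    for c in confs:
        occ = set(c)
        profiles.append([sum((x + dx, y + dy) not in occ for dx, dy in DIRS)
                         for x, y in c])
    counts = Counter()
    for seq in product((0, 1), repeat=n):  # h_i = 1 for H, 0 for P
        energies = [sum(E[s] for s, h in zip(p, seq) if h) for p in profiles]
        e_min = min(energies)
        ground = {ids[k] for k, e in enumerate(energies) if e - e_min <= tol}
        if len(ground) == 1:               # non-degenerate ground state
            counts[ground.pop()] += 1
    return counts

d = designability(8)
histogram = Counter(d.values())            # designability -> number of structures
print(sorted(histogram.items()))
```

With this convention `len(conformations(10))` and `len(conformations(11))` give 2,034 and 5,513, the numbers of structures quoted in the text for the square lattice, which correspond to counting walks up to rotation and reflection. The cost grows as the number of structures times $2^N$, so pure Python is practical only for short chains.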

On the basis of the data obtained with the solvation model and the HP model, we examine
what determines the designability from three points of view: (1) the effect of the
search-space restriction, namely, the search within maximally compact structures (in this
paper we use maximally compact structures as the simplest example of a search-space
restriction; other restrictions may be considered, e.g., structures with the biggest core);
(2) the effect of the lattice structure, namely, whether the lattice is bipartite or not (or,
equivalently, the even-odd problem); (3) the effect of the number of monomers (i.e., the
length of the chain).

RESULTS AND DISCUSSION 

We now present the results of the calculations.

(1) The effect of the search within maximally compact structures 

In Fig. 1, we show the designability calculated on the square lattice for $N = 16$ using
only maximally compact structures. For comparison, Fig. 2 shows the designability of the
same system without the search-space restriction (i.e., with the search over all possible
structures). In both cases there are some highly designable structures. However, these
structures are not common to both cases. In Fig. 2, the number of sequences that have a
native structure is 8277, but only 1087 of them have a maximally compact structure as the
native one. That is, most sequences that have a native structure have a non-maximally
compact native structure. The importance of non-maximally compact structures has also been
pointed out for the HP model. These facts imply that it is not appropriate to calculate the
designability over maximally compact structures only. A calculation that picks a "native"
structure out of the maximally compact structures is incorrect if the true native structure
is non-maximally compact. Furthermore, when the lowest-energy non-maximally compact
structure and the lowest-energy maximally compact structure are degenerate, there is no
native structure (a native structure must be non-degenerate), yet the restricted search
falsely reports a native (and maximally compact) structure. We conclude that the
designability calculated over maximally compact structures only may be erroneous.
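
Both failure modes can be illustrated with a toy example. The sketch below uses made-up energies for a single sequence on four structures, of which structures 0 and 1 are taken to be maximally compact; the helper `native` and all numbers are hypothetical and serve only to show how the restricted search can report a spurious native state.

```python
def native(energies, allowed=None, tol=1e-9):
    """Index of the unique lowest-energy structure among `allowed`, else None."""
    idx = range(len(energies)) if allowed is None else allowed
    e_min = min(energies[k] for k in idx)
    ground = [k for k in idx if energies[k] - e_min <= tol]
    return ground[0] if len(ground) == 1 else None

compact = [0, 1]                  # indices of the maximally compact structures
tie = [1.0, 2.0, 1.0, 3.0]        # structure 2 is degenerate with structure 0
lower = [1.0, 2.0, 0.5, 3.0]      # structure 2 is the true ground state

print(native(tie, compact), native(tie))      # restricted: 0, full search: None
print(native(lower, compact), native(lower))  # restricted: 0, full search: 2
```

In the first case the full search finds no native state at all, and in the second it finds a different one; in both cases the restricted search returns structure 0.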

(2) The effect of the lattice structure: bipartite or non-bipartite 

In the two previous studies using the HP model, the interactions of the system directly
reflected whether the lattice is bipartite or not. Moreover, the designability on the
triangular lattice was calculated with the energy parameters in Eq. (1) being $E_{HH} = -1$
and $E_{HP} = E_{PP} = 0$, which would cause accidental degeneracies. In those results, highly
designable structures were not found for the triangular lattice. It also seemed that native
structures are likely to contain a hydrophobic core in which a group of hydrophobic monomers
are in contact with one another; on a bipartite lattice, such a contact can be made only if
the distance between the monomers along the sequence is odd. Therefore, bipartiteness has
been thought to be a main source of the designability. If so, highly designable structures
do not actually exist, i.e., the concept of designability itself could be meaningless. On
the other hand, if such preferred structures should exist, as in the proposal by C. Chothia,
the use of the lattice model would be inadequate. We therefore used the solvation model,
which does not directly reflect the bipartiteness, and calculated the designability on the
square and triangular lattices. In addition, we calculated the designability on the
triangular lattice using the HP model with the energy parameters $E_{HH} = -2.3$,
$E_{HP} = -1.0$, and $E_{PP} = 0.0$.

In Table II, we show the total number of sequences that have a non-degenerate ground
state ($S_n$) and the highest designability ($D_h$) on the triangular lattice for $N = 13$,
obtained with different interactions. The result shows that, even if we take different
values of the energy parameters, or even if we use the solvation model, the triangular
lattice remains unfavorable for designability, although $S_n$ varies considerably. For the
square lattice, on the other hand, highly designable structures are found in the solvation
model as well as in the HP model (Fig. 2). These results imply that the absence of highly
designable structures on the triangular lattice should be ascribed not to the even-odd
problem (i.e., the non-bipartiteness) but to other reasons. The property that highly
designable structures are found on the square lattice but not on the triangular lattice
might be general in two-letter lattice models in which the hydrophobic force is dominant.

(3) The effect of the number of monomers 

Then why are highly designable structures absent on the triangular lattice? One possible
reason is that the number of monomers is too small (in other words, the chain is too short).
An important object in a protein structure is the hydrophobic core, which consists of
buried monomers that have no contact with the solvent. Recall that the maximum chain length
that can be treated by exact enumeration of the full conformational space on the triangular
lattice is 13. The biggest core that can be made with this limited length consists of only
three monomers; the chain is too short for the hydrophobic force to form a core. This
monomer-number effect is also found on the square lattice. Consider the following
conditions: at least ten sequences have a given structure as their native state, and there
are at least five such structures. Only if these conditions are satisfied do we say that
"there are highly designable structures." Then, for $N = 10$ or less, there are no highly
designable structures even on the square lattice (Table III, Table IV). This result implies
that, when we discuss whether highly designable structures exist, we need a chain long
enough to make a core of sufficient size. It further implies that, in the three-dimensional
case, we will need a longer chain than in the two-dimensional case to make a core.
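
The criterion can be applied mechanically to the tabulated histograms. The sketch below is a small illustration with a hypothetical helper function; the dictionaries hold the rows of Tables III and IV as printed, and the Table IV entry whose designability value is not legible in the table is left out.

```python
def has_highly_designable(histogram, min_sequences=10, min_structures=5):
    """histogram maps designability -> number of structures with that value."""
    n = sum(count for d, count in histogram.items() if d >= min_sequences)
    return n >= min_structures

# Table III (square lattice, N = 10).
table3 = {1: 1, 2: 2, 3: 4, 4: 3, 5: 4, 6: 1, 10: 2, 12: 1}
# Table IV (square lattice, N = 11), legible rows only.
table4_legible = {1: 5, 2: 11, 3: 4, 4: 1, 5: 3, 8: 1, 10: 1, 13: 1, 29: 1, 36: 1, 43: 1}

print(has_highly_designable(table3))          # only 3 structures reach 10 sequences
print(has_highly_designable(table4_legible))  # 5 structures reach 10 sequences
```

The N = 10 data fail the criterion, while the N = 11 data already meet it with the legible rows alone.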

Let us look at Tables III, IV, and V. In Table V, we show the designability calculated
on the triangular lattice for $N = 13$. On the square lattice for $N = 10$, the biggest core
consists of two monomers. Both on the triangular lattice for $N = 13$ and on the square
lattice for $N = 11$, the biggest core consists of three monomers. We see that the
triangular lattice is unfavorable for designability compared with the square lattice, even
when the biggest possible core size is the same or a little larger. A possible reason is the
number of all possible structures, particularly the number of structures with the biggest
core. As the chain becomes longer, the number of all possible structures increases almost
exponentially, as $\mu^N$ ($2 < \mu < 3$ for the square lattice and $4 < \mu < 5$ for the
triangular lattice). On the triangular lattice for $N = 13$, the number of all possible
structures is 6,279,601, of which 4,110 have the biggest core. On the square lattice for
$N = 10$ and $N = 11$, on the other hand, the numbers of all possible structures are 2,034
and 5,513, and the numbers of structures with the biggest core are 23 and 5, respectively.
Thus the number of all possible structures and the number of structures with the biggest
core on the triangular lattice are much larger than those on the square lattice. In
consequence, the degeneracy tends to grow, which is unfavorable for designability. In this
view, designable structures would appear less readily on the triangular lattice than on the
square lattice.
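
The growth of the conformational space on the triangular lattice can be checked with a short counting routine. The sketch below is an illustration under our own conventions (axial coordinates, first step fixed, first turn restricted to one side so that walks are counted up to rotation and reflection); the function name is hypothetical.

```python
# Triangular lattice in axial coordinates; index k is a step at an angle of 60*k degrees.
TRI = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]

def count_structures(n):
    """Number of n-monomer self-avoiding walks up to rotation and reflection."""
    def extend(pos, occupied, left, bent):
        if left == 0:
            return 1
        total = 0
        for k, (dx, dy) in enumerate(TRI):
            if not bent and k in (4, 5):
                continue              # mirror images of walks that first bend left
            nxt = (pos[0] + dx, pos[1] + dy)
            if nxt in occupied:
                continue
            occupied.add(nxt)
            total += extend(nxt, occupied, left - 1, bent or k != 0)
            occupied.remove(nxt)
        return total
    return extend((1, 0), {(0, 0), (1, 0)}, n - 2, False)

counts = {n: count_structures(n) for n in range(3, 10)}
ratios = [counts[n + 1] / counts[n] for n in range(5, 9)]
print(counts)
print(ratios)   # successive growth factors of the number of structures
```

For these short chains the successive ratios lie between 4 and 5, in line with the range quoted for the triangular lattice. Counted in the same way, $N = 13$ should give the 6,279,601 structures cited above, although pure Python takes noticeably longer at that length.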

SUMMARY 

We have calculated the designability of protein structures using the solvation model
and the HP model in order to deduce model-independent properties of designability. The
solvation model introduced in this paper satisfies two conditions: (1) the hydrophobic
force is dominant, and (2) the model does not directly reflect the bipartiteness. We have
examined what determines the designability from three points of view: the effect of
restricting the search to maximally compact structures, the bipartite/non-bipartite effect,
and the length of the chain.

As a result, we have found that it is inadequate to calculate the designability within
maximally compact structures only. Our results imply that non-bipartiteness is not the
reason why no highly designable structures have been found on the triangular lattice. We
suppose that the main factor affecting the designability is the chain length, because a
sufficiently long chain is required for a sufficiently large hydrophobic core to form. The
triangular lattice is less favorable for designability than the square lattice irrespective
of the model or the energy parameters, probably because the number of all possible
structures is large. However, if longer chains than those of the present study can be
treated, highly designable structures may be found even on the triangular lattice.
Calculations of the designability for longer chains on the triangular lattice are highly
desirable. These conclusions would apply to a wide variety of two-letter lattice models in
which the hydrophobic force is dominant, regardless of the energy parameters and further
details of the model.

Although the concept of designability is currently defined for two-letter lattice models,
our final goal is to examine whether natural proteins have highly designable structures.
It is therefore an interesting problem to extend the study of designability to 20-letter
models (e.g., the MJ model and the KGS model) and to off-lattice models. Substituting a
20-letter code for the 2-letter code certainly reduces the degeneracy, so that most
sequences come to have a structure as their non-degenerate ground state (i.e., native
structure).

FIG. 1. The designability calculated over maximally compact structures on the square lattice
for N = 16. "Number of structures" on the vertical axis is the number of structures that
have the "Designability" given on the horizontal axis.

FIG. 2. The designability calculated over all possible structures on the square lattice for N = 16.

TABLES

                               | H. Li et al. (HP model)                                 | A. Irback and E. Sandelin [5]                           | H. Li et al. (new model)
lattice                        | square and cubic                                        | square and triangular                                   | square
interaction                    | nearest-neighbor                                        | nearest-neighbor                                        | depends on the position of an H
Hamiltonian                    | H = -\sum_{i<j} E_{\sigma_i \sigma_j} \Delta(r_i - r_j) | H = -\sum_{i<j} E_{\sigma_i \sigma_j} \Delta(r_i - r_j) | H = -\sum_{i=1}^{N} s_i h_i
energy parameters              | (-2.3, -1.0, 0.0)                                       | (-1, 0, 0)                                              | -
(E_{HH}, E_{HP}, E_{PP})       |                                                         |                                                         |
conformational space           | maximally compact                                       | all                                                     | maximally compact
highly designable structures   | found on both lattices                                  | found on square lattice but not on triangular one       | found

TABLE I. Differences among the three studies. Each variable in the Hamiltonian
is defined in the text.

HP model (-1, 0, 0)
HP model (-2.3, -1.0, 0.0) 
solvation model 



129 3 
7 1 



TABLE II. $S_n$ and $D_h$ on the triangular lattice for $N = 13$. The values in
parentheses are the energy parameters $(E_{HH}, E_{HP}, E_{PP})$. The data for the HP
model with $E_{HH} = -1$, $E_{HP} = E_{PP} = 0$ were obtained by A. Irback and
E. Sandelin [5]. $S_n$ and $D_h$ are defined in the text.



Designability    Number of Structures
 1               1
 2               2
 3               4
 4               3
 5               4
 6               1
10               2
12               1

TABLE III. The designability calculated over all possible structures on the square
lattice for N = 10. "Number of Structures" is the number of structures that have the
"Designability" given on its left.

Designability    Number of Structures
 1               5
 2               11
 3               4
 4               1
 5               3
 8               1
10               1
13               1
                 1
29               1
36               1
43               1

TABLE IV. The designability calculated over all possible structures on the square lattice
for N = 11.




Designability    Number of Structures
 1               7

TABLE V. The designability calculated over all possible structures on the triangular
lattice for N = 13.